THE APPROXIMATE SOLUTION OF A SIMPLE CONSTRAINED SEARCH PATH
MOVING TARGET PROBLEM USING MOVING HORIZON POLICIES 

Presented here are the results of applying moving horizon policies to approximately solve a moving target problem in which both the searcher and the target have constraints on their paths. The solution procedure can be viewed as an approximation of the optimal dynamic programming method of Eagle (1982). This approximation may be useful if limits on available computer storage or computer time do not allow calculation of the optimal solution.

Only one problem geometry was examined. The problem was selected to keep the computer computations feasible rather than to be representative of any real-world search. It is possible that the patterns observed in the solution are specific to this problem geometry. Further work is required to establish the generality (or lack thereof) of these results.

1. The Problem 

The target and searcher both move in discrete time among the 9 cells shown in Figure 1. The searcher starts in cell 1, and the target starts in cell 9. In each time period the searcher can move from his current cell to any adjacent cell, where cells are adjacent if they share a common side. The searcher can also choose to remain in his current cell. The target moves from cell to cell according to a specified Markov transition matrix. The probability of the target remaining in any cell i, given it was in cell i in the previous time period, is .4. The probability that the target transitions to any cell adjacent to i is .6/c_i, where c_i is the number of cells adjacent to i. So the target transition matrix P = [p_ij] has entries

$$p_{ij} = \begin{cases} .4 & \text{if } j = i, \\ .6/c_i & \text{if } j \text{ is adjacent to } i, \\ 0 & \text{otherwise.} \end{cases}$$

If the searcher chooses the cell occupied by the target, then the target is detected with probability .5. If the searcher chooses a cell not occupied by the target, then the target cannot be detected during that time period. The searcher has T time periods in which to search. His problem is to select the T-time period search path that minimizes the probability of target non-detection (PND).
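The PND of any given search path can be computed by carrying along, for each cell, the probability that the target is there and has not yet been detected. The sketch below is hypothetical: Figure 1 is not reproduced, so it assumes the cells are numbered row by row on a 3 x 3 grid (cells 1 and 9 are then opposite corners), that the target is in cell 9 when the first search is made, and that the target moves after each search. The names `neighbors`, `transition_matrix`, and `pnd` are illustrative.

```python
# Hypothetical sketch: PND of a fixed search path on the 9-cell grid.
# ASSUMPTIONS: row-by-row numbering (1 2 3 / 4 5 6 / 7 8 9), 0-indexed in code;
# the target is in cell 9 at the first search and moves after each search.
ROWS = COLS = 3
N = ROWS * COLS
STAY = 0.4  # P(target remains in its cell)
Q = 0.5     # P(detection | target is in the searched cell)


def neighbors(cell):
    """Cells sharing a common side with `cell`."""
    r, c = divmod(cell, COLS)
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [rr * COLS + cc for rr, cc in cand if 0 <= rr < ROWS and 0 <= cc < COLS]


def transition_matrix():
    """p_ij = .4 if j == i, .6 / c_i if j is adjacent to i, 0 otherwise."""
    P = [[0.0] * N for _ in range(N)]
    for i in range(N):
        adj = neighbors(i)
        P[i][i] = STAY
        for j in adj:
            P[i][j] = (1.0 - STAY) / len(adj)
    return P


def pnd(path, start_searcher=0, start_target=8):
    """Probability that the target is not detected when `path` is searched."""
    P = transition_matrix()
    u = [0.0] * N  # u[k] = P(target in cell k and not yet detected)
    u[start_target] = 1.0
    pos = start_searcher
    for d in path:
        if d != pos and d not in neighbors(pos):
            raise ValueError(f"cell {d} is not reachable from cell {pos}")
        u[d] *= 1.0 - Q  # unsuccessful search of cell d
        u = [sum(u[i] * P[i][j] for i in range(N)) for j in range(N)]  # target moves
        pos = d
    return sum(u)


# Searcher starts in cell 1 and searches cells 2, 3, 6 (0-indexed: 1, 2, 5).
print(round(pnd([1, 2, 5]), 4))  # prints 0.88
```

Only the third search can find the target here: after two moves, cell 6 holds .24 of the target probability, and a .5 detection probability removes .12 of it.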
Figure 1. 9-cell search grid.

2. Moving Horizon Policies



The problem presented was solved approximately using m-time period moving horizon (m-TPMH) policies. Such a policy is defined as follows: when T time periods remain in which to search and T > m, the m-TPMH policy selects as the next search cell the cell that would be optimal if m time periods remained in the problem. When T ≤ m, the optimal search path is selected. The 1-TPMH policy is called the myopic policy.
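A moving horizon policy can be prototyped by brute force, without the vector-set dynamic program of Appendix A: with n periods remaining, enumerate every feasible search path of length min(m, n) from the current target distribution, search the first cell of the best one, update, and repeat. The sketch below swaps in that exhaustive enumeration (practical only for small m) and assumes, hypothetically, that Figure 1 numbers the cells row by row on a 3 x 3 grid, that the target is in cell 9 at the first search, and that ties go to remaining in the current cell; the report does not state its tie-breaking rule. The names `step`, `lookahead`, and `mtpmh` are illustrative.

```python
# Hypothetical sketch of an m-TPMH policy on the 9-cell grid, using brute-force
# enumeration of m-step paths (NOT the vector-set dynamic program of Appendix A).
# ASSUMPTIONS: row-by-row numbering, 0-indexed cells; target in cell 9 at the
# first search; ties go to the first candidate cell, i.e. staying put.
ROWS = COLS = 3
N = ROWS * COLS
STAY, Q = 0.4, 0.5


def neighbors(cell):
    r, c = divmod(cell, COLS)
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [rr * COLS + cc for rr, cc in cand if 0 <= rr < ROWS and 0 <= cc < COLS]


P = [[0.0] * N for _ in range(N)]
for i in range(N):
    P[i][i] = STAY
    for j in neighbors(i):
        P[i][j] = (1.0 - STAY) / len(neighbors(i))


def step(u, d):
    """Unsuccessful search of cell d, then one target move (u is unnormalized)."""
    u = list(u)
    u[d] *= 1.0 - Q
    return [sum(u[i] * P[i][j] for i in range(N)) for j in range(N)]


def lookahead(u, pos, k):
    """Smallest non-detection mass reachable with k more searches, and first cell."""
    if k == 0:
        return sum(u), None
    best_val, best_cell = float("inf"), None
    for d in [pos] + neighbors(pos):  # staying is tried first, so ties favor it
        val, _ = lookahead(step(u, d), d, k - 1)
        if val < best_val:
            best_val, best_cell = val, d
    return best_val, best_cell


def mtpmh(m, T, start_searcher=0, start_target=8):
    """Run the m-TPMH policy for T periods; return (PND, search path)."""
    u = [0.0] * N
    u[start_target] = 1.0
    pos, path = start_searcher, []
    for n in range(T, 0, -1):  # n = number of periods remaining
        _, d = lookahead(u, pos, min(m, n))  # the full optimal path once n <= m
        u, pos = step(u, d), d
        path.append(d)
    return sum(u), path


for m in (1, 2, 3):
    print(m, round(mtpmh(m, 8)[0], 4))
```

Scoring paths against the unnormalized mass u rather than the normalized distribution does not change any decision, since every candidate is scaled by the same factor. Under these assumptions, with T = 3 the myopic searcher never reaches a cell the target can occupy (PND = 1), while the 3-TPMH policy, which is optimal for T = 3, reaches PND = .88.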

Moving horizon policies were introduced for the Markov decision process by Shapiro (1969) and have recently been suggested for search applications by Stewart (1984).

For this investigation, dynamic programming was used to construct the (m+1)-TPMH policy from the m-TPMH policy. The details are in Appendix A and Eagle (1982).
3. Experimental Results

A total of 320 cases were examined, combining problem lengths T = 1, 2, ..., 40 with m-TPMH policies m = 1, 2, ..., 8. In addition, the optimal solutions were obtained (using dynamic programming and total enumeration) for T from 1 to 15 time periods. Figures 2 through 7 illustrate some observations suggested by the data collected.

Observation 1: For the moving horizon and optimal policies examined, the decrease in PND with increasing T was "almost asymptotically geometric."

Figures 2 through 6 illustrate "almost." In Figure 2, PND is plotted on a logarithmic scale against T. (On such a plot a geometric decrease, PND(T) = c r^T, appears as a straight line, and the ratio PND(T)/PND(T-1) equals the constant r.) Here the PND for the myopic solution, the 8-TPMH solution, and the optimal solution appears very nearly asymptotically geometrically decreasing. It is also apparent that the 8-TPMH policy generates a PND that decreases more rapidly than that generated by the myopic policy. Figures 3 and 4 show, however, that there is some fine structure in the graphs of PND that is not apparent in Figure 2. In Figure 3, the ratio PND(T)/PND(T-1) is plotted for the myopic and 8-TPMH policies. Figure 4 is a similar plot with an expanded y-axis scale. It appears that while the myopic policy is asymptotically geometric, the 8-TPMH policy is not. Graphs of PND(T)/PND(T-1) for the other moving horizon policies tested show an "almost asymptotically geometric" pattern similar to that of the 8-TPMH policy. (See Figures 5 and 6.)

Observation 2: It is possible for an m_1-TPMH policy to produce a smaller PND than an m_2-TPMH policy when m_1 < m_2.
[Figure residue: legend "Marks optimal solution (solved for time periods 1-15)"; x-axis: T (number of time periods in problem).]

Figure 3. PND(T)/PND(T-1) for Myopic and 8-TPMH Policies.

Figure 4. PND(T)/PND(T-1) for Myopic and 8-TPMH Policies.

Figure 6. PND(T)/PND(T-1) for m-TPMH Policies (m = 6, 7) and Optimal Policy.



In general, m-TPMH policies performed better as m increased from 1 to 8, but there were some exceptions, as Figure 7 illustrates. Here the difference in PND produced by the 3- and 4-TPMH policies is plotted against problem length T. A negative value of this difference indicates that the 3-TPMH policy performed better than the 4-TPMH policy for that particular value of T. For example, for T = 11, the 3-TPMH policy produced a PND of .4426, while the 4-TPMH policy gave .4434. The difference, .4426 - .4434 = -.0008, is plotted in Figure 7.

Observation 3: For T ≤ 15, the optimal and 8-TPMH policies produced identical PND.

This is not to suggest that the 8-TPMH policy is optimal (it is not: the 6-TPMH policy produced smaller values of PND for some T), but rather that it may be a good approximately optimal policy for this problem.
Figure 7. The Difference in PND Produced by 3-TPMH and 4-TPMH Policies.



4. Looking for a Lower Bound to PND

Moving horizon policies provide an upper bound to the optimal PND. It would be useful to construct a lower bound as well. If, for all T greater than or equal to some $\hat T$, the optimal policy produced a non-decreasing PND(T)/PND(T-1) (as does the myopic policy in this example for T ≥ 3), then

$$\mathrm{PND}(T) \ge \mathrm{PND}(\hat T)\left(\frac{\mathrm{PND}(\hat T)}{\mathrm{PND}(\hat T - 1)}\right)^{T - \hat T} \qquad \text{for all } T \ge \hat T,$$

because PND(T) equals PND($\hat T$) multiplied by the $T - \hat T$ ratios PND(t)/PND(t-1), $t = \hat T + 1, \ldots, T$, each of which is at least PND($\hat T$)/PND($\hat T - 1$). Unfortunately, the optimal policy in this example did not generate a non-decreasing PND(T)/PND(T-1). (See Figures 4 and 6.) The strongest statement about the optimal PND that the data collected can support is apparently the following:

For all $\hat T \in \{1, 2, \ldots, 15\}$ there exists a maximum $\gamma(\hat T) > 0$ satisfying

$$\hat T < T \le 15 \;\Longrightarrow\; \frac{\mathrm{PND}(T)}{\mathrm{PND}(T-1)} \ge \gamma(\hat T).$$

That is, for each $\hat T$, there was some maximum positive constant, $\gamma(\hat T)$, which defined the tightest geometrically decreasing lower bound to PND(T), $T > \hat T$.
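Computing $\gamma(\hat T)$ from a run of PND values is a minimum over the one-period ratios that follow $\hat T$. The sketch below is hypothetical: the report's PND tables are not reproduced, so the sequence used here is made up for illustration, and `ratios`, `gamma`, and `geometric_lower_bound` are illustrative names.

```python
def ratios(pnd):
    """{T: PND(T)/PND(T-1)} for T >= 1, where pnd[T] holds PND(T) and pnd[0] = 1."""
    return {T: pnd[T] / pnd[T - 1] for T in range(1, len(pnd))}


def gamma(pnd, t_hat):
    """Largest gamma with PND(T)/PND(T-1) >= gamma for all t_hat < T <= last T."""
    later = [r for T, r in ratios(pnd).items() if T > t_hat]
    return min(later) if later else None


def geometric_lower_bound(pnd, t_hat, T):
    """PND(t_hat) * gamma(t_hat)**(T - t_hat), which is <= PND(T) within the data."""
    return pnd[t_hat] * gamma(pnd, t_hat) ** (T - t_hat)


# Made-up PND sequence (NOT data from this report), indexed by T = 0, 1, ..., 5.
pnd = [1.0, 0.95, 0.85, 0.78, 0.72, 0.665]
g = gamma(pnd, 2)
# A small tolerance absorbs floating-point rounding in the comparison.
print(round(g, 4), all(geometric_lower_bound(pnd, 2, T) <= pnd[T] + 1e-12
                       for T in range(2, len(pnd))))
```

The bound holds inside the data by construction, since each later ratio is at least $\gamma(\hat T)$; it says nothing about T beyond the last value supplied, which is exactly the gap the report faces for T > 15.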

The data also support the following observation concerning the moving horizon PND.

Observation 4: For the m-TPMH policies examined with T > 10,

$$\frac{\mathrm{PND}(T)}{\mathrm{PND}(T-1)} \ge \frac{\mathrm{PND}(10)}{\mathrm{PND}(9)}.$$

That is, for T > 10, the one-time-period geometric decrease in the moving horizon PND(T) was bounded below by PND(10)/PND(9). If this observation also holds for the optimal policy, then for T > 15 we have for the optimal policy

$$\mathrm{PND}(T) = \mathrm{PND}(15)\,\frac{\mathrm{PND}(16)}{\mathrm{PND}(15)}\,\frac{\mathrm{PND}(17)}{\mathrm{PND}(16)}\cdots\frac{\mathrm{PND}(T)}{\mathrm{PND}(T-1)} \ge \mathrm{PND}(15)\left(\frac{\mathrm{PND}(10)}{\mathrm{PND}(9)}\right)^{T-15} \ge .3308\,(.9281)^{T-15}. \qquad (1)$$



If (1) is a lower bound for this problem, it is a fairly tight one. This possible lower bound is plotted in Figure 2. Figure 8 shows the difference between this possible bound and the PND produced by the 8-TPMH, 2-TPMH, and myopic policies. Figure 8 also suggests that increasing m from 1 to 2 resulted in considerably more policy improvement than did increasing m from 2 to 8.
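Bound (1) is easy to tabulate. The sketch below simply evaluates .3308 (.9281)^(T-15) using the two constants reported above; `possible_lower_bound` is an illustrative name, and the bound itself remains conjectural because it rests on Observation 4 holding for the optimal policy.

```python
def possible_lower_bound(T, pnd_15=0.3308, ratio=0.9281):
    """Bound (1): PND(15) * (PND(10)/PND(9))**(T - 15), stated for T >= 15."""
    if T < 15:
        raise ValueError("bound (1) is stated only for T >= 15")
    return pnd_15 * ratio ** (T - 15)


for T in (15, 20, 25, 30, 35, 40):
    print(T, round(possible_lower_bound(T), 4))
```

Because the ratio is held fixed, each additional time period multiplies the bound by .9281, so it is a straight line on the logarithmic scale of Figure 2.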

Figure 8. The Difference in PND Produced by m-TPMH Polic 



Appendix A: The Dynamic Programming Procedure for Determining Moving Horizon Policies

We make the following definitions:

$C$ = set of all cells = $\{1, 2, \ldots, N\}$,

$C_j$ = set of all cells accessible in 1 time period to a searcher in cell $j$,

$q_j = P\{\text{target detection} \mid \text{target in cell } j \text{ and search conducted in cell } j\}$,

$p_{ij} = P\{\text{target transitions in 1 time period from cell } i \text{ to cell } j\}$,

$P$ = target transition matrix = $[p_{ij}] \in \mathbb{R}^{N \times N}$,

$d_n$ = the cell searched when $n$ time periods remain in the problem,

$\delta^n = (d_n, d_{n-1}, \ldots, d_1)$ = an $n$-time period search path,

$\pi_j$ = probability that the target is in cell $j$,

$\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ = target probability distribution over $C$.

With any n-time period search path, $\delta^n$, there can be associated a vector $a \in \mathbb{R}^N$ such that $a_i = P\{\text{target detection} \mid \delta^n \text{ is followed; target in cell } i \text{ when search begins}\}$. The probability of detection when $\delta^n$ is followed and the initial target distribution is $\pi$ is then $\pi a$. Now let $A(n,i)$ be the set of vectors associated with all possible $\delta^n$, given the searcher is in cell $i$ when $n$ time periods remain. Then the maximum obtainable n-time period probability of detection given an initial target distribution of $\pi$ is

$$V(\pi, i) = \max_{a \in A(n,i)} \pi a. \qquad (A1)$$

The optimal n-time period search path is then the $\delta^n$ associated with the maximizing $a \in A(n,i)$.

The dynamic programming problem is then to construct the vector sets $A(n+1,1), A(n+1,2), \ldots, A(n+1,N)$ from the vector sets $A(n,1), A(n,2), \ldots, A(n,N)$. Also, each $a \in A(n+1,i)$ must have associated with it an (n+1)-time period search path.

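Let $\hat a$ be any element of $A(n,j)$ and $\delta^n$ be the n-time period search path associated with $\hat a$. Now the N-vector associated with the (n+1)-time period search path $(j, \delta^n)$ is

$$a = e_j q_j + P_j \hat a,$$

where $e_j \in \mathbb{R}^N$ is the jth unit vector and $P_j \in \mathbb{R}^{N \times N}$ is $P$ with row $j$ multiplied by $(1 - q_j)$. To see this, interpret the components of $a$ and $\hat a$ as probabilities of detection when n+1 and n searches, respectively, remain in the problem. A target that starts in cell $j$ is detected by the first search with probability $q_j$; otherwise, with probability $1 - q_j$, it moves to cell $k$ with probability $p_{jk}$ and is then detected with probability $\hat a_k$. A target that starts in any cell $i \ne j$ escapes the first search and is detected with probability $\sum_k p_{ik} \hat a_k$. The entire set $A(n+1,i)$ is then

$$A(n+1,i) = \{a \in \mathbb{R}^N \mid a = e_j q_j + P_j \hat a;\ j \in C_i \ \&\ \hat a \in A(n,j)\}. \qquad (A2)$$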

The dynamic programming process begins by setting $A(0,i) = \{0\}$, where $0 \in \mathbb{R}^N$, for $i = 1, 2, \ldots, N$. One iteration gives the myopic solution. Specifically, applying (A2) when $A(0,i) = \{0\}$ yields

$$A(1,i) = \{e_j q_j : j \in C_i\}, \qquad i = 1, \ldots, N,$$

with the vector $e_j q_j$ associated with the 1-time period search path $\delta^1 = d_1 = j$. Continued application of (A2) allows recursive construction of the sets $A(n,i)$ with an n-time period search path associated with each vector in each set.
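One stage of recursion (A2) can be written directly from the definitions. The sketch below is a minimal, unpruned version: vector sets are Python lists of `(a, path)` pairs whose paths list the first search first, cells are 0-indexed, and `extend`, `next_stage`, `initial_stage`, and the two-cell data at the bottom are hypothetical, chosen so the numbers can be checked by hand. Applying a stage once to A(0,i) = {0} produces one vector e_j q_j for each cell j the searcher can reach.

```python
def extend(P, q, j, a_hat):
    """Vector a = e_j q_j + P_j a_hat for the path (j, delta^n)."""
    N = len(P)
    a = []
    for i in range(N):
        scale = 1.0 - q[j] if i == j else 1.0  # P_j: row j of P times (1 - q_j)
        a.append(scale * sum(P[i][k] * a_hat[k] for k in range(N)))
    a[j] += q[j]  # the e_j q_j term
    return a


def next_stage(P, q, C, A):
    """A(n+1, i) = {e_j q_j + P_j a_hat : j in C_i, a_hat in A(n, j)}, unpruned."""
    return {
        i: [(extend(P, q, j, a_hat), (j,) + path) for j in C[i] for a_hat, path in A[j]]
        for i in range(len(P))
    }


def initial_stage(N):
    """A(0, i) = {0}, paired with the empty search path."""
    return {i: [([0.0] * N, ())] for i in range(N)}


# Hypothetical 2-cell example: the target stays or switches with probability .5,
# q_j = .5, and the searcher may search either cell in any period.
P = [[0.5, 0.5], [0.5, 0.5]]
q = [0.5, 0.5]
C = {0: [0, 1], 1: [0, 1]}
A1 = next_stage(P, q, C, initial_stage(2))
A2 = next_stage(P, q, C, A1)
print(A1[0])     # [([0.5, 0.0], (0,)), ([0.0, 0.5], (1,))]
print(A2[0][0])  # ([0.625, 0.25], (0, 0))
```

As a check, searching the first cell twice detects a target that starts there with probability .5 + .5(.5)(.5) = .625, and one that starts in the other cell with probability .5(.5) = .25. Without pruning, each set grows by a factor of up to |C_i| per stage, which is why the domination tests below matter.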
The set $A(n,i)$ constructed in this manner from the sets $A(n-1,j)$, $j \in C_i$, may contain some vectors that will never maximize (A1) for any target distribution $\pi$. The $\delta^n$ associated with each of these "dominated" vectors cannot be an optimal n-time period search path. To test whether a vector $\hat a \in A(n,i)$ is dominated, the following linear program is solved:

$$\begin{aligned} \min_{\pi,\,x}\quad & x - \pi \hat a \\ \text{s.t.}\quad & x \ge \pi a, \quad a \in A(\hat a), \\ & \pi \in \Pi, \end{aligned}$$

where $A(\hat a)$ is $A(n,i)$ less the vector $\hat a$, and $\Pi = \{\pi \in \mathbb{R}^N \mid \pi_i \ge 0 \text{ and } \sum_i \pi_i = 1\}$. Whenever the minimal value of $x - \pi \hat a$ is non-negative, no target distribution makes $\hat a$ strictly better than every other vector in $A(n,i)$, so $\hat a$ is dominated and can be removed from $A(n,i)$. Only the non-dominated vectors in $A(n,i)$ need be used to construct $A(n+1,j)$. Letting $B$ be the convex hull of $A(\hat a)$, Eagle (1982) showed that $\hat a$ is dominated if and only if there exists some $b \in B$ such that $b \ge \hat a$.



A simpler domination procedure is to remove $\hat a$ from $A(n,i)$ whenever there exists a vector $a \in A(\hat a)$ such that $a \ge \hat a$ componentwise. This method is easier to implement than the linear programming procedure, but it does not reduce $A(n,i)$ to its minimum size, so more computer storage is required to save $A(n,i)$ at each stage of the dynamic program.
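The simpler componentwise test can be sketched as follows. It is an illustration rather than the report's code: `dominates` and `prune_pointwise` are hypothetical names, vector sets are lists of `(a, path)` pairs, and exact duplicates are collapsed to their first occurrence.

```python
def dominates(a, b):
    """True if a >= b in every component."""
    return all(x >= y for x, y in zip(a, b))


def prune_pointwise(A_ni):
    """Drop each (a_hat, path) for which another vector a in the set has a >= a_hat."""
    kept = []
    for a_hat, path in A_ni:
        if any(dominates(a, a_hat) for a, _ in kept):
            continue  # a_hat is dominated by, or equal to, a vector already kept
        # a_hat may in turn dominate vectors kept earlier; drop those.
        kept = [(a, p) for a, p in kept if not dominates(a_hat, a)]
        kept.append((a_hat, path))
    return kept


A = [([0.5, 0.0], "x"), ([0.4, 0.0], "y"), ([0.0, 0.5], "z"), ([0.2, 0.2], "w")]
print(prune_pointwise(A))  # [([0.5, 0.0], 'x'), ([0.0, 0.5], 'z'), ([0.2, 0.2], 'w')]
```

Here [0.2, 0.2] survives even though it never maximizes πa: for every π, one of [0.5, 0] or [0, 0.5] gives πa ≥ .25 > .2, and the convex-hull point [0.25, 0.25] dominates it. The linear program would remove it, which is why the componentwise test leaves A(n,i) larger than necessary.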

Once the vector sets $A(m,i)$, $i = 1, \ldots, N$, have been constructed and a $\delta^m$ has been associated with each $a \in A(m,i)$, the m-TPMH policy is available. Assume n > m time periods remain in the problem, the searcher is in cell $i$, and the target distribution is $\pi$. Then the m-TPMH policy picks as $d_n$ the first element of $\delta^m$, where $\delta^m$ is the m-time period search path associated with

$$\operatorname*{argmax}_{a \in A(m,i)} \pi a. \qquad (A3)$$

If the target is not detected in time period n, the target distribution is given a Bayesian update for the unsuccessful search and advanced one period through the transition matrix $P$, and (A3) is used again, with the searcher now in cell $d_n$, to determine $d_{n-1}$. When the problem solution progresses to the point where m time periods remain, the m-TPMH policy picks the optimal $\delta^m$ for the remaining time periods.
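The decision rule (A3) and the update between periods can be sketched as below, under stated assumptions: cells are 0-indexed, vector sets are lists of `(a, path)` pairs whose paths list the first search first, ties in (A3) go to the first vector listed, and the update conditions on the miss before applying one target move through P. The names `choose_cell` and `update_after_miss` and the two-cell numbers are hypothetical.

```python
def dot(pi, a):
    """pi . a, the detection probability of the path behind `a`."""
    return sum(p * x for p, x in zip(pi, a))


def choose_cell(pi, A_mi):
    """(A3): d_n is the first cell of the path whose vector maximizes pi . a."""
    a, path = max(A_mi, key=lambda pair: dot(pi, pair[0]))
    return path[0], dot(pi, a)


def update_after_miss(pi, q, P, d):
    """Bayes' rule for an unsuccessful search of cell d, then one target move."""
    post = [p * (1.0 - q[k]) if k == d else p for k, p in enumerate(pi)]
    miss = sum(post)  # = 1 - pi_d q_d, the probability the search fails
    post = [p / miss for p in post]
    N = len(pi)
    return [sum(post[i] * P[i][j] for i in range(N)) for j in range(N)]


# Hypothetical two-cell data: both cells searchable, q = (.5, .5), so the
# one-period vector set is {e_0 q_0, e_1 q_1}.
A_1 = [([0.5, 0.0], (0,)), ([0.0, 0.5], (1,))]
pi = [0.6, 0.4]
d, p_detect = choose_cell(pi, A_1)
pi_next = update_after_miss(pi, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], d)
print(d, p_detect, pi_next)
```

Here the policy searches cell 0 (detection probability .3); after a miss the posterior is (3/7, 4/7), and one target move gives roughly (.443, .557), which is the distribution (A3) would use next.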